In-class Exercise 06

ISSS608 Visual Analytics In-class Exercise 06

Jiarui Cui www.linkedin.com/in/jiarui-cui-482232195 (School of Computing and Information Systems)https://scis.smu.edu.sg/
2022-05-29

In this Hands-on Exercise, the following R packages will be used: sf, an R package specially designed to handle geospatial data in simple feature objects.

packages = c('lubridate', 'tidyverse', 'readr',
             'tmap','sf','sftime','clock','rmarkdown')
for (p in packages){
  if(!require(p, character.only = T)){
    install.packages(p)
  }
  library(p,character.only = T)
}

In the code chunk below, read_sf() of sf package is used to parse School.csv into R as an sf data.frame and parses Pubs.csv, Apartments.csv, Buildings.csv, Employer.csv, and Restaurants.csv into R

schools <- read_sf("data/wkt/Schools.csv", 
                   options = "GEOM_POSSIBLE_NAMES=location")
buildings <- read_sf("data/wkt/Buildings.csv", 
                   options = "GEOM_POSSIBLE_NAMES=location")
apartments <- read_sf("data/wkt/Apartments.csv", 
                   options = "GEOM_POSSIBLE_NAMES=location")
employers <- read_sf("data/wkt/Employers.csv", 
                   options = "GEOM_POSSIBLE_NAMES=location")
pubs <- read_sf("data/wkt/Pubs.csv", 
                   options = "GEOM_POSSIBLE_NAMES=location")
restaurants <- read_sf("data/wkt/Restaurants.csv", 
                   options = "GEOM_POSSIBLE_NAMES=location")

After importing the data file into R, it is important for us to review the data object.

print(buildings)
Simple feature collection with 1042 features and 4 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: -4762.191 ymin: -30.08359 xmax: 2650 ymax: 7850.037
CRS:           NA
# A tibble: 1,042 × 5
   buildingId                       location buildingType maxOccupancy
   <chr>                           <POLYGON> <chr>        <chr>       
 1 1          ((350.0639 4595.666, 390.0633… Commercial   ""          
 2 2          ((-1926.973 2725.611, -1948.1… Residental   "12"        
 3 3          ((685.6846 1552.131, 645.9985… Commercial   ""          
 4 4          ((-976.7845 4542.382, -1053.2… Commercial   ""          
 5 5          ((1259.306 3572.727, 1299.255… Residental   "2"         
 6 6          ((478.8969 1082.484, 473.6596… Commercial   ""          
 7 7          ((-1920.823 615.7447, -1960.8… Residental   ""          
 8 8          ((-3302.657 5394.354, -3301.5… Commercial   ""          
 9 9          ((-600.5789 4429.228, -495.95… Commercial   ""          
10 10         ((-68.75908 5379.924, -28.782… Residental   "5"         
# … with 1,032 more rows, and 1 more variable: units <chr>

The code chunk below plots the building polygon features by using tm_polygon().

tmap_mode("view")
tm_shape(buildings)+
tm_polygons(col = "grey60",
           size = 1,
           border.col = "black",
           border.lwd = 1)
tmap_mode("plot")

The code chunk below is used to plot a composite map by combining the buildings and employers simple feature data.frames.

tmap_mode("plot") # the sequence is important - depends on the layers
tm_shape(buildings)+ # read to get the data
tm_polygons(col = "grey60",
           size = 1,
           border.col = "black",
           border.lwd = 1) +
tm_shape(employers) +
  tm_dots(col = "red")

Importing wkt data

logs <- read_sf("data/wkt/ParticipantStatusLogs1.csv", 
                options = "GEOM_POSSIBLE_NAMES=currentLocation")

Processing movement data

To process the movement data, the following steps will be performed:

convert timestamp field from character data type to date-time data type by using date_time_parse() of clock package. derive a day field by using get_day() of clock package. extract records whereby currentMode field is equal to Transport class by using filter() of dplyr package.

logs_selected <- logs %>%
  mutate(Timestamp = date_time_parse(timestamp,
                                     zone = "",
                                     format = "%Y-%m-%dT%H:%M:%S")) %>%
  mutate(day = get_day(Timestamp))%>%
  filter(currentMode == "Transport")
write_rds(logs_selected,"data/rds/logs_selected.rds")

In the code chunk below, st_make_grid() of sf package is used to create haxegons

hex <- st_make_grid(buildings, 
                    cellsize=100, 
                    square=FALSE) %>%
  st_sf() %>%
  rowid_to_column('hex_id')
plot(hex)

The code chunk below perform point in polygon overlay by using [st_join()] of sf package.

points_in_hex <- st_join(logs_selected, 
                         hex, 
                         join=st_within)
#plot(points_in_hex, pch='.')

In the code chunk below, st_join() of sf package is used to count the number of event points in the hexagons.

points_in_hex <- st_join(logs_selected, 
                        hex, 
                        join=st_within) %>%
  st_set_geometry(NULL) %>%
  count(name='pointCount', hex_id)
head(points_in_hex)
# A tibble: 6 × 2
  hex_id pointCount
   <int>      <int>
1    169         35
2    212         56
3    225         21
4    226         94
5    227         22
6    228         45

In the code chunk below, left_join() of dplyr package is used to perform a left-join by using hex as the target table and points_in_hex as the join table. The join ID is hex_id.

hex_combined <- hex %>%
  left_join(points_in_hex, 
            by = 'hex_id') %>%
  replace(is.na(.), 0)

In the code chunk below, tmap package is used to create the hexagon binning map.

tm_shape(hex_combined %>%
           filter(pointCount > 0))+
  tm_fill("pointCount",
          n = 8,
          style = "quantile") +
  tm_borders(alpha = 0.1)

Code chunk below joins the event points into movement paths by using the participants’ IDs as unique identifiers.

logs_path <- logs_selected %>%
  group_by(participantId, day) %>%
  summarize(m = mean(Timestamp), 
            do_union=FALSE) %>%
  st_cast("LINESTRING")
print(logs_path)
Simple feature collection with 5781 features and 3 fields
Geometry type: LINESTRING
Dimension:     XY
Bounding box:  xmin: -4616.828 ymin: 35.4377 xmax: 2630 ymax: 7836.546
CRS:           NA
# A tibble: 5,781 × 4
# Groups:   participantId [1,011]
   participantId   day m                               currentLocation
   <chr>         <int> <dttm>                             <LINESTRING>
 1 0                 1 2022-03-01 13:34:23 (-2721.353 6862.861, -2689…
 2 0                 2 2022-03-02 14:19:50 (-2721.353 6862.861, -2689…
 3 0                 3 2022-03-03 13:39:13 (-2721.353 6862.861, -2689…
 4 0                 4 2022-03-04 13:38:11 (-2721.353 6862.861, -2689…
 5 0                 5 2022-03-05 13:08:02 (-2721.353 6862.861, -2689…
 6 0                 6 2022-03-06 06:28:00 (-2721.353 6862.861, -2689…
 7 1                 1 2022-03-01 18:07:24 (-1531.133 5597.244, -1863…
 8 1                 2 2022-03-02 16:57:05 (-2619.036 5860.49, -2200.…
 9 1                 3 2022-03-03 14:13:40 (-260.4575 5026.151, -352.…
10 1                 4 2022-03-04 14:31:45 (-3903.194 5967.837, -3655…
# … with 5,771 more rows
logs_path_selected <- logs_path %>%
  filter(participantId==4)
tmap_mode("view")
tm_shape(buildings)+
tm_polygons(col = "grey60",
           size = 1,
           border.col = "black",
           border.lwd = 1) +
  tm_shape(logs_path_selected)+
  tm_lines(col = "blue")
tmap_mode("plot")